# Calculating nested aggregates

## Use case

Sometimes, there's a need to calculate a double aggregation over a fact
table. For example, if you have a `line_items` table that has `store_id`,
`order_id`, and `sales` columns, you might wonder what is the median of
_sales per product_ for each store.

With an ad-hoc SQL query, this double aggregation would probably
be expressed as follows:

```sql
WITH sales_per_store_product AS (
  SELECT store_id, product_id, SUM(sales) AS sales
  FROM line_items
  GROUP BY 1, 2
)

SELECT store_id, PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY sales) AS sales_median
FROM sales_per_store_product
GROUP BY 1
``` 

## Data modeling

In Cube, [measures][ref-measures] are used to define aggregates. However,
a single measure can only contain a single aggregation, e.g., `SUM`,
`APPROX_COUNT_DISTINCT`, or `PERCENTILE_CONT`.

If you'd like to define a double aggregation, e.g., a median of a sum of
values, the _outer_ aggregation would need to be defined in a separate
[cube][ref-cube] and the _inner_ aggregation (measure) would need to be
brought to that cube as a [subquery dimension][ref-subquery-dimension].
Also, these cubes would need to have a join definition between them.

Consider the following data model:

```yaml
cubes:
  - name: nested_agg_sales
    sql: >
      SELECT 1 AS id, 1 AS store_id, 1 AS product_id, 10 AS sales UNION ALL
      SELECT 2 AS id, 1 AS store_id, 1 AS product_id, 20 AS sales UNION ALL
      SELECT 3 AS id, 1 AS store_id, 2 AS product_id, 30 AS sales UNION ALL
      SELECT 4 AS id, 1 AS store_id, 2 AS product_id, 40 AS sales UNION ALL
      SELECT 5 AS id, 2 AS store_id, 1 AS product_id, 50 AS sales UNION ALL
      SELECT 6 AS id, 2 AS store_id, 1 AS product_id, 60 AS sales UNION ALL
      SELECT 7 AS id, 2 AS store_id, 2 AS product_id, 70 AS sales UNION ALL
      SELECT 8 AS id, 2 AS store_id, 2 AS product_id, 80 AS sales

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: store_id
        sql: store_id
        type: number

      - name: product_id
        sql: product_id
        type: number

      - name: store_product_id
        sql: "CONCAT({store_id}, '-', {product_id})"
        type: string

    measures:
      - name: sales
        sql: sales
        type: sum

  - name: nested_agg_stores_orders
    sql: >
      SELECT store_id, product_id
      FROM (
        SELECT 1 AS id, 1 AS store_id, 1 AS product_id, 10 AS sales UNION ALL
        SELECT 2 AS id, 1 AS store_id, 1 AS product_id, 20 AS sales UNION ALL
        SELECT 3 AS id, 1 AS store_id, 2 AS product_id, 30 AS sales UNION ALL
        SELECT 4 AS id, 1 AS store_id, 2 AS product_id, 40 AS sales UNION ALL
        SELECT 5 AS id, 2 AS store_id, 1 AS product_id, 50 AS sales UNION ALL
        SELECT 6 AS id, 2 AS store_id, 1 AS product_id, 60 AS sales UNION ALL
        SELECT 7 AS id, 2 AS store_id, 2 AS product_id, 70 AS sales UNION ALL
        SELECT 8 AS id, 2 AS store_id, 2 AS product_id, 80 AS sales
      ) AS raw
      GROUP BY 1, 2

    joins:
      - name: nested_agg_sales
        sql: "{nested_agg_stores_orders.store_product_id} = {nested_agg_sales.store_product_id}"
        relationship: one_to_many

    dimensions:
      - name: store_id
        sql: store_id
        type: number

      - name: product_id
        sql: product_id
        type: number

      - name: store_product_id
        sql: "CONCAT({store_id}, '-', {product_id})"
        type: string
        primary_key: true

      - name: sales_sum
        sql: "{nested_agg_sales.sales}"
        type: number
        sub_query: true

    measures:
      - name: median_sales
        sql: "PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY {sales_sum})"
        type: number
```

As you can see, the sum of sales for per store and per product is defined
in the `nested_agg_sales` cube as the `sales` measure. Then, it is brought
to the `nested_agg_stores_orders` cube as `sales_sum` that is defined as
a subquery dimension. Also, a join is defined between both cubes.

Then, the median of sales is defined as the `median_sales` measure in the
`nested_agg_stores_orders` cube. It’s OK to reference `sales_sum` in this
measure because now it's a dimension; referencing a measure from another
cube here would not work.

## Result

Querying the `median_sales` measure would give the expected result:

<Screenshot src="https://ucarecdn.com/2346d4df-841a-4aa7-9cdf-ad4eba35dd15/"/>

We can verify that it's correct by adding one more dimension to the query:

<Screenshot src="https://ucarecdn.com/6557aa71-6035-4c80-a3d3-ce772a32f867/"/>

[ref-measures]: /reference/data-model/measures
[ref-cube]: /reference/data-model/cube
[ref-subquery-dimension]: /product/data-modeling/concepts/calculated-members#subquery-dimensions